Ranking with submodular functions on a budget

Submodular maximization has been the backbone of many important machine-learning problems, and has applications to viral marketing, diversification, sensor placement, and more. However, the study of maximizing submodular functions has mainly been restricted to the context of selecting a set of items. On the other hand, many real-world applications require a solution that is a ranking over a set of items. The problem of ranking in the context of submodular function maximization has been considered before, but to a much lesser extent than item-selection formulations. In this paper, we explore a novel formulation for ranking items with submodular valuations and budget constraints. We refer to this problem as max-submodular ranking (MSR). In more detail, given a set of items and a set of non-decreasing submodular functions, where each function is associated with a budget, we aim to find a ranking of the set of items that maximizes the sum of values achieved by all functions under the budget constraints. For the MSR problem with cardinality- and knapsack-type budget constraints we propose practical algorithms with approximation guarantees. In addition, we perform an empirical evaluation, which demonstrates the superior performance of the proposed algorithms against strong baselines.


Introduction
Combinatorial optimization plays a central role in many machine-learning problems. One prevalent approach to solve such problems is via submodular-optimization techniques. The popularity of submodular-optimization methods results from the fact that in many real-world settings the objective function exhibits the "diminishing returns" property, as well as from the ever-growing rich toolkit that has been developed in the past decades. One fundamental primitive in this toolkit is submodular maximization (Krause and Golovin 2014), which has been the backbone of a number of important problems, such as sensor placement (Krause et al. 2008), viral marketing in social networks (Kempe et al. 2015), document summarization (Lin and Bilmes 2011), and more. Submodular optimization has mainly been studied in the context of subset-selection problems. However, in many real-world applications the goal is to find a ranking over a set of items. Finding a ranking is a significantly more challenging task than subset selection, as the search space is factorially larger. One successful attempt of applying ideas from submodular optimization to ranking is the submodular-ranking problem (SR) (Azar and Gamzu 2011). In this problem, given a set of items and a set of submodular functions, the goal is to find a (partial) ranking of the items so as to minimize the average "cover time" of all functions.
An exemplary application of SR is in the multiple intents re-ranking problem (Azar et al. 2009), which has applications in web searching. In this problem setting, a user query may correspond to multiple user intents. For example, a query of "java" may mean a programming language, an island, or a type of coffee. Even for a seemingly unambiguous query, such as "New York," there exist many possible intents, for example, attractions, cuisine, travel, cultural events, etc. In the absence of an explicit user intent, we need to consider all possibilities. The SR formulation proposes to model each intent as a submodular function, whose value improves when a non-redundant web page of the right intent is encountered, and reaches a maximum when the user is satisfied, i.e., having gathered sufficient information. The goal is to produce a ranking of web pages that minimizes the expected number of pages a user has to browse before they satisfy their information needs. The expectation here is over the distribution of different user intents, which for this particular application can be assumed to be known.
While the SR formulation can be useful in some cases, it fails to model a number of other applications realistically. Critically, it assumes that a demand can wait indefinitely before it gets satisfied. In the previous example, for instance, it is assumed that users will keep reading down a ranked list of web pages until they gather enough information. In reality, a budget can be set for the amount of service that a user receives. The budget can be the number of web pages to browse, or the time to spend on the web-search task. A user stops receiving service once the budget is exceeded. Moreover, the budget can vary across different demands. For example, a user intent can be classified into one of three types, informational, navigational, and transactional (Jansen et al. 2008), and each may come with a different budget, translating to the amount of "patience" that a user exhibits to obtain results for each type. User intents and budgets can be readily extracted from past search logs.
To accommodate budgeted versions of the submodular ranking problem, we propose a new formulation, which we call max-submodular ranking (MSR). In the MSR problem, we are given a set of non-decreasing submodular functions, each associated with a budget. We aim to find a ranking that, instead of minimizing the total coverage time of the functions, maximizes the sum of function values (coverage) under individual budget constraints. In other words, every item in the ranking incurs a cost, and each function is evaluated at the maximal prefix of the ranked sequence that does not exceed its budget. A precise formulation of the MSR problem is provided in Sect. 3.
In this paper, we propose practical algorithms with approximation guarantees for MSR, when the budget constraints are either cardinality or knapsack constraints. We also note that the well-known constrained submodular maximization and minimum submodular cover problems are special cases of MSR and SR, respectively, when there is a single submodular function. In this sense, the MSR problem we define is a dual problem of SR, in the same way that max k-cover is a dual problem of minimum set cover.
MSR has great potential to be applied in other scenarios, such as the case where the submodular functions are 0-1 activation functions. We call this special case the max-activation ranking (MAR) problem. The idea is to activate as many demands as possible with a common ranking of items, or services, under individual budget constraints. As an example, some subscription-based streaming media services, such as Netflix, produce content in a data-driven fashion. One possibility is to arrange the plot structure of a TV series so that as many viewers as possible become interested before reaching their individual cut-off points for a new show. The goal for the TV series producer is to encourage the largest possible audience to continue watching. A plot structure can be characterized as a sequence of scenes, each described by a set of tags, such as romantic, adventurous, funny, etc., which may interest particular audiences. Similar applications can also be found in ranking commercial ads, ranking customer reviews, creating playlists for music streaming services, and more.
Concretely, our contributions in this paper are summarized as follows.
- We introduce the novel problem of max-submodular ranking (MSR), where the goal is to find a ranking of a set of items so as to maximize the total value of a set of submodular functions under budget constraints.
- We prove that a simple greedy algorithm achieves a factor-2 approximation for the MSR problem under cardinality constraints, which is tight for this particular greedy algorithm.
- We show that a weighted greedy algorithm that pays more attention to functions with small budget achieves a factor-3 approximation for the MSR problem under cardinality constraints. While its worst-case bound is worse, there are natural problem instances for which the weighted greedy finds better solutions than its unweighted counterpart.
- We devise a new algorithm that returns the best solution among the solutions found by a cost-efficient greedy algorithm and a ranking of "large" items produced by dynamic programming. Our algorithm achieves an approximation factor arbitrarily close to 4 for the MSR problem under knapsack constraints.
- We empirically evaluate and compare different algorithms on real-life datasets, and find that the proposed algorithms achieve superior performance when compared with strong baselines.
The rest of the paper is organized as follows. We start by discussing the related work in Sect. 2, and we formally introduce the MSR problem in Sect. 3. The unweighted and weighted greedy algorithms for MSR under cardinality constraints are presented and analyzed in Sects. 4.1 and 4.2, respectively. The novel algorithm for the MSR problem under knapsack constraints is introduced and analyzed in Sect. 5. We present our empirical evaluation in Sect. 6, and we offer our concluding remarks in Sect. 7.

Submodular maximization
Submodular maximization is a special case of our formulation in which only a single function is given. For a non-decreasing function under a cardinality constraint, it is well known that a simple greedy algorithm achieves an e/(e − 1) approximation, which is also known to be tight. For a more general budget constraint, a natural algorithm is to return the best solution among the solutions found by a cost-efficient greedy method and by selecting the best singleton item. Recently, the approximation factor of this "best-of-two" algorithm was shown to lie within [1/0.462, 1/0.427] (Feldman et al. 2020). A better 2-approximation is achieved by another greedy variant that returns the best solution among the solutions found by a cost-efficient greedy algorithm and all its intermediate solutions, each augmented with the best single additional item (Yaroslavtsev et al. 2020).

Submodularity for a sequence function
A sequential utility function is defined as f : S → R, where S is the set of all possible sequences of subsets of a ground set of items V. Note that a set function can be seen as a special sequence function, in which the diminishing-returns effect holds for any subsequence relation. Streeter and Golovin (2008) and Zhang et al. (2012) introduce a notion of string submodularity, which restricts the diminishing returns to only the prefix subsequence relation. That is to say, a function f is string submodular if appending an item to a sequence results in no larger marginal gain than appending the item to a prefix of the sequence. The goal is to find a sequence of a given length that maximizes the value of the function f. In our formulation, the sum of multiple submodular functions remains submodular, and thus, string submodular. However, the analysis in the prior work does not apply in our case as we assume that each submodular function is associated with a different budget constraint. Azar and Gamzu (2011) propose the submodular ranking (SR) problem, which aims to find a permutation to minimize the average "cover time" of a set of submodular functions. Here we say that an input sequence "covers" a function if the function attains its maximum value on it, and the "cover time" of a function is the length of the shortest prefix of the sequence that covers it. The problem we study in this paper can be seen as a dual problem of the SR problem. The SR problem originates from the classic min-sum set cover (MSSC) problem (Feige et al. 2004) and its generalizations (Azar et al. 2009; Gamzu 2010).
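The prose definition of string submodularity can be written as a single inequality. The following is a sketch of that definition only; the symbols A and B, and the use of ⊕ for appending an item to a sequence, are notation chosen here for illustration and need not match the cited works, which may impose additional conditions.

```latex
f\bigl(B \oplus (v)\bigr) - f(B) \;\le\; f\bigl(A \oplus (v)\bigr) - f(A)
\qquad \text{for every item } v \text{ and all sequences } A, B \text{ such that } A \text{ is a prefix of } B .
```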

Diversified web search
In web search, in the absence of the explicit user intent, it is desirable to provide a sequence of high-quality and diverse documents that account for the interests of the overall user population. Typically, the diversity is evaluated by the coverage at the topical level of some existing taxonomy (Zhai et al. 2015). Carbonell and Goldstein (1998) propose a greedy algorithm with respect to maximal marginal relevance (MMR) to reduce the redundancy among returned documents. Bansal et al. (2010) define the problem of finding an ordering of search results that maximizes the discounted cumulative gain (DCG), i.e., the sum of discounted gains of different user types, where the discount factor increases if a user type is satisfied later on. They show that, in some special cases, the DCG metric can be rewritten as a weighted sum of submodular functions. Our framework contributes to this theme by, for example, casting each user type or topic as a submodular function.

Problem definition
We are given a universe set V with |V| = n items, a set of m non-decreasing submodular functions F = {f_1, ..., f_m}, and a cost function c : V → R_+. Recall that a set function f : 2^V → R_+ is submodular if f(A ∪ {v}) − f(A) ≥ f(B ∪ {v}) − f(B) for all A ⊆ B ⊆ V and v ∈ V \ B, and non-decreasing if f(A) ≤ f(B) for all A ⊆ B ⊆ V. Let σ(V) denote the set of permutations of V, that is, σ(V) = {π : V → V | π is a permutation}. Our goal is to find a permutation π ∈ σ(V) to maximize the sum of function values f_i(π_{ℓ_i}), where the input set π_{ℓ_i} is a prefix of the sought permutation π with feasibility constraints. In particular, we consider that each function f_i receives as input the maximal prefix of π that fits within its corresponding budget b_i. In other words, the permutation π can be seen as a sequence of nested sets, one for each function. Formally, the max-submodular ranking (MSR) problem that we study in this paper is defined as follows.
Problem 1 (Max-submodular ranking (MSR)) Given a set of items V, a set of non-decreasing and submodular functions F = {f_1, ..., f_m}, a cost function c : V → R_+, and non-negative budgets b_i for each function f_i, the MSR problem aims to find a permutation π ∈ σ(V) that maximizes the sum
$$\sum_{i=1}^{m} f_i(\pi_{\ell_i}), \qquad \ell_i = \max\{\, j : c(\pi_j) \le b_i \,\},$$
where π_j is the prefix of the permutation π of length j (with π_0 = ∅) and c(π_j) = ∑_{v ∈ π_j} c(v).
We make a number of observations for Problem 1. Without loss of generality, we can assume that f_i(∅) = 0; otherwise we can translate the objective function by ∑_{f_i ∈ F} f_i(∅).
Also note that not all items in the permutation solution π will necessarily be used as an input to some function f_i ∈ F. Instead, only the items in π_{ℓ_i} for the largest ℓ_i will be used. For this reason, we can think of the output of the MSR problem as a partial permutation; after all functions deplete their budget, the remaining items of the permutation do not matter.
Finally, note that when the cost function c is uniform, i.e., c(·) = 1, we can consider only integral budgets b_i and assume ℓ_i = b_i.
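To make the objective concrete, the following minimal sketch evaluates a given ranking: each function is applied to the longest prefix whose cumulative cost fits within its budget. The names `msr_value` and `coverage`, the dictionary-based cost, and the toy items are illustrative assumptions, not part of the paper's implementation.

```python
from bisect import bisect_right
from itertools import accumulate

def msr_value(perm, functions, budgets, cost=None):
    """Sum over i of f_i evaluated on the maximal prefix of perm within budget b_i."""
    # cost is an optional dict item -> positive cost; None means unit costs
    item_costs = [cost[v] if cost else 1 for v in perm]
    prefix_costs = list(accumulate(item_costs))  # c(pi_1), c(pi_2), ...
    total = 0
    for f, b in zip(functions, budgets):
        # number of leading items whose cumulative cost is at most b
        ell = bisect_right(prefix_costs, b)
        total += f(frozenset(perm[:ell]))
    return total

def coverage(targets):
    """Toy non-decreasing submodular function: number of targets contained in S."""
    targets = frozenset(targets)
    return lambda S: len(targets & S)

fs = [coverage({"a"}), coverage({"a", "b", "c"})]
budgets = [1, 3]
good = msr_value(("a", "b", "c"), fs, budgets)   # f_1 sees {a}, f_2 sees {a, b, c}
bad = msr_value(("c", "b", "a"), fs, budgets)    # f_1 sees {c}, f_2 sees {a, b, c}
```

Under these assumptions the first ranking serves the small-budget function early, which is exactly the effect the budgets are meant to capture.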
With respect to the hardness of approximation of the MSR problem, we observe that MSR is equivalent to the standard submodularity-maximization problem when m = 1, that is, when there is only one function in F. A second reduction from the standard submodularity-maximization problem can be obtained by letting b i = b, for all i = 1, . . . , m, i.e., when the same budget is used for all functions. The reason is that in this case the sum of submodular functions remains submodular, and we ask to maximize a submodular function under a cardinality constraint. We conclude the following hardness result.
Remark 1 For solving the max-submodular ranking (MSR) problem, no algorithm requiring a polynomial number of function evaluations can achieve a better approximation guarantee than e/(e − 1).
It is also well known that maximum k-cover, a special case of submodular maximization, is a dual problem to the minimum set cover problem, where the constraint in one problem is treated as the objective function in the other (Feige 1998). More generally, the MSR problem can be considered as the dual problem to the submodular-ranking problem (SR) (Azar and Gamzu 2011), whose goal is to find a (partial) ranking of the items so as to minimize the average "cover time" of all functions.
We conclude the section by introducing some additional notation that will be used in our analysis. The optimal permutation is denoted by π * . We use the operator ⊕ to denote sequence concatenation and overload operator ⊆ for subsequence relation.

Cardinality constraints
We start our analysis of the MSR problem for the case of cardinality constraints, that is, when the item costs are uniform (c(·) = 1). For this particular case we present two algorithms, called Greedy-U and Greedy-W, both having provable guarantees. Both algorithms generate a permutation by greedily selecting one item before the next. Pseudocode for both algorithms is shown in a unified manner in Algorithm 1. The difference in the two algorithms lies in adopting different coefficients α i , associated with the submodular functions f i , in their selection criteria. The first algorithm, Greedy-U, is an unweighted greedy (α i = 1) with respect to the submodular functions f i . The second algorithm, Greedy-W, is a weighted greedy (α i = 1/b i ) that puts more weight on functions with smaller budget.

Input: An instance of MSR and weights α
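The greedy scheme described above can be sketched as follows for unit costs. This is a minimal illustration, assuming that the selection criterion at iteration j is the α-weighted sum of marginal gains over the functions whose budget is at least j; the function name `greedy_rank`, the tie-breaking by item name, and the toy instance are assumptions made here, not the authors' code.

```python
def greedy_rank(items, functions, budgets, weights=None):
    """Greedy ranking for unit costs; weights=None means alpha_i = 1 (Greedy-U)."""
    m = len(functions)
    alpha = weights if weights is not None else [1.0] * m
    perm, chosen = [], frozenset()
    current = [f(chosen) for f in functions]
    remaining = set(items)
    for j in range(1, min(len(remaining), max(budgets)) + 1):
        # only functions that can still afford a j-th item contribute
        active = [i for i in range(m) if budgets[i] >= j]

        def gain(v):
            S = chosen | {v}
            return sum(alpha[i] * (functions[i](S) - current[i]) for i in active)

        best = max(sorted(remaining), key=gain)  # sorted only for deterministic ties
        perm.append(best)
        remaining.remove(best)
        chosen = chosen | {best}
        current = [f(chosen) for f in functions]
    return perm

def coverage(targets):
    targets = frozenset(targets)
    return lambda S: len(targets & S)

items = ["a", "b", "c"]
fs = [coverage({"a"}), coverage({"b"}), coverage({"b", "c"})]
budgets = [1, 3, 3]
rank_u = greedy_rank(items, fs, budgets)                            # Greedy-U
rank_w = greedy_rank(items, fs, budgets, [1 / b for b in budgets])  # Greedy-W
```

On this toy instance the unweighted variant starts with item b and leaves the budget-1 function with nothing, while the weighted variant starts with item a; the objective values are 3 and 4, respectively.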
The worst-case running time of both algorithms is O(n²m). In practice, they run much faster and their actual running time grows almost linearly in n, thanks to applying a standard lazy evaluation technique (Leskovec et al. 2007). More details on scalability are discussed in Sect. 6.4.
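The lazy evaluation idea can be sketched as below. It relies on the observation that, for non-decreasing submodular functions, the weighted gain of an item can only shrink from one iteration to the next (marginal gains shrink and functions drop out as budgets run out), so a stale gain is a valid upper bound. The heap layout, the function name `lazy_greedy_rank`, and the evaluation counter are assumptions of this sketch rather than details taken from the paper.

```python
import heapq

def lazy_greedy_rank(items, functions, budgets, weights=None):
    """Unit-cost greedy with lazy re-evaluation of stale upper bounds."""
    m = len(functions)
    alpha = weights if weights is not None else [1.0] * m
    chosen = frozenset()
    current = [f(chosen) for f in functions]
    evaluations = 0

    def gain(v, j):
        nonlocal evaluations
        evaluations += 1
        S = chosen | {v}
        return sum(alpha[i] * (functions[i](S) - current[i])
                   for i in range(m) if budgets[i] >= j)

    # heap entries: (-upper bound on gain, item, iteration when the bound was computed)
    heap = [(-float("inf"), v, 0) for v in items]
    heapq.heapify(heap)
    perm = []
    for j in range(1, min(len(items), max(budgets)) + 1):
        while True:
            _, v, stamp = heapq.heappop(heap)
            if stamp == j:
                break  # fresh value that is at least every other upper bound
            heapq.heappush(heap, (-gain(v, j), v, j))
        perm.append(v)
        chosen = chosen | {v}
        current = [f(chosen) for f in functions]
    return perm, evaluations

def coverage(targets):
    targets = frozenset(targets)
    return lambda S: len(targets & S)

fs = [coverage({"a"}), coverage({"b"}), coverage({"b", "c"})]
lazy_perm, n_evals = lazy_greedy_rank(["a", "b", "c"], fs, [1, 3, 3])
```

The saving comes from the early `break`: on larger instances most items are never re-evaluated in a given iteration, although on a three-item toy example no evaluations are skipped.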

Unweighted greedy
We show that the unweighted greedy algorithm (α i = 1) achieves a 2-approximation guarantee for the MSR problem with uniform cost. In addition, we show that the approximation ratio is tight for this particular algorithm.
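Let π denote the permutation returned by the greedy, and let π_j be its prefix of length j. By the greedy selection criterion, for an arbitrary item v ∈ V in the j-th iteration it holds that
$$\sum_{i : b_i \ge j} \bigl( f_i(\pi_j) - f_i(\pi_{j-1}) \bigr) \;\ge\; \sum_{i : b_i \ge j} \bigl( f_i(\pi_{j-1} \cup \{v\}) - f_i(\pi_{j-1}) \bigr).$$
The main idea of the proof is to choose an appropriate item v for the above inequality at each iteration of the greedy, and sum over all iterations. We denote the j-th item of the optimal permutation π* by v*_j, and the prefix of π* of length j by π*_j. We write ALG to denote the value achieved by the Greedy-U algorithm. Then
$$\mathrm{ALG} = \sum_{j} \sum_{i : b_i \ge j} \bigl( f_i(\pi_j) - f_i(\pi_{j-1}) \bigr) \ge \sum_{j} \sum_{i : b_i \ge j} \bigl( f_i(\pi_{j-1} \cup \{v^*_j\}) - f_i(\pi_{j-1}) \bigr) \ge \sum_{i} \sum_{j \le b_i} \bigl( f_i(\pi_{b_i} \cup \{v^*_j\}) - f_i(\pi_{b_i}) \bigr) \ge \sum_{i} \bigl( f_i(\pi_{b_i} \cup \pi^*_{b_i}) - f_i(\pi_{b_i}) \bigr) \ge \sum_{i} \bigl( f_i(\pi^*_{b_i}) - f_i(\pi_{b_i}) \bigr) = \mathrm{OPT} - \mathrm{ALG},$$
where the first inequality is the greedy criterion with v = v*_j, the second and third follow from submodularity, and the fourth from monotonicity. Consequently, 2ALG ≥ OPT, proving the claim.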
We complete the analysis of the Greedy-U algorithm for the MSR variant with cardinality constraints, by showing that the approximation ratio 2 is tight.
Remark 2 Greedy-U (Algorithm 1 with coefficients α i = 1) cannot do better than 2-approximation for the MSR problem with uniform item costs (c(·) = 1).

Proof
We construct an instance where the algorithm returns a solution with ALG arbitrarily close to (1/2) OPT. The main idea is to force the algorithm to pick up items that are only beneficial to functions with large budget and "starve" those with small budget in the early iterations. Consider Clearly every f_i is non-decreasing and submodular. One possible optimal permutation is π* = (v_1, ..., v_n), which leads to OPT = m. Algorithm 1 with coefficient α_i = 1 returns a permutation (out of many equivalent possible permutations) π = (v_n, ..., v_1) with ALG = (1 + ε)m/2. By letting ε be arbitrarily small, we see that the bound in Theorem 1 is tight.

Weighted greedy
Inspired by the instance that yields the tight bound in Remark 2, it is reasonable to let the algorithm favor functions with small budget at the early iterations. Such a strategy is desirable as it in some sense suggests fairness in resource allocation, i.e., more functions can afford at least one item from the returned ranking. It also turns out to have better performance in experiments. We show that such a strategy is indeed reliable by proving a constant-factor approximation guarantee.
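Let π denote the permutation returned by the greedy and π_j its prefix of length j. By the greedy selection criterion, we know that for an arbitrary item v ∈ V in the j-th iteration it holds that
$$\sum_{i : b_i \ge j} \frac{1}{b_i} \bigl( f_i(\pi_j) - f_i(\pi_{j-1}) \bigr) \;\ge\; \sum_{i : b_i \ge j} \frac{1}{b_i} \bigl( f_i(\pi_{j-1} \cup \{v\}) - f_i(\pi_{j-1}) \bigr).$$
We denote by v*_j the j-th item of the optimal permutation π*. The idea is to replace the arbitrary item v with v*_k ∈ π* and compute a weighted sum. In order to define the weights, given k < j, we write d_{jk} = 1/2, and d_{jj} = (j + 1)/2. Immediately,
$$\sum_{j} \sum_{k \le j} d_{jk} \sum_{i : b_i \ge j} \frac{1}{b_i} \bigl( f_i(\pi_j) - f_i(\pi_{j-1}) \bigr) \;\ge\; \sum_{j} \sum_{k \le j} d_{jk} \sum_{i : b_i \ge j} \frac{1}{b_i} \bigl( f_i(\pi_{j-1} \cup \{v^*_k\}) - f_i(\pi_{j-1}) \bigr).$$
We will denote the left hand side of the above inequality by LHS, and the right hand side by RHS. We will first bound the RHS. In order to do so, we need an additional bound on the weights d_{jk}, namely, for any fixed k and any b ≥ k,
$$\sum_{j = k}^{b} d_{jk} = \frac{k + 1}{2} + \frac{b - k}{2} = \frac{b + 1}{2} \ge \frac{b}{2}.$$
We can now bound the right hand side with
$$\mathrm{RHS} \ge \sum_{i} \frac{1}{b_i} \sum_{k \le b_i} \Bigl( \sum_{j = k}^{b_i} d_{jk} \Bigr) \bigl( f_i(\pi_{b_i} \cup \{v^*_k\}) - f_i(\pi_{b_i}) \bigr) \ge \frac{1}{2} \sum_{i} \sum_{k \le b_i} \bigl( f_i(\pi_{b_i} \cup \{v^*_k\}) - f_i(\pi_{b_i}) \bigr) \ge \frac{1}{2} \sum_{i} \bigl( f_i(\pi^*_{b_i}) - f_i(\pi_{b_i}) \bigr) = \frac{\mathrm{OPT} - \mathrm{ALG}}{2},$$
where the first and third inequalities use submodularity and monotonicity. Now we consider the left hand side. Since ∑_{k ≤ j} d_{jk} = (j − 1)/2 + (j + 1)/2 = j and j ≤ b_i for every function counted at iteration j,
$$\mathrm{LHS} = \sum_{j} \sum_{i : b_i \ge j} \frac{j}{b_i} \bigl( f_i(\pi_j) - f_i(\pi_{j-1}) \bigr) \le \sum_{j} \sum_{i : b_i \ge j} \bigl( f_i(\pi_j) - f_i(\pi_{j-1}) \bigr) = \mathrm{ALG}.$$
Putting everything together, ALG ≥ LHS ≥ RHS ≥ (OPT − ALG)/2, and we obtain 3ALG ≥ OPT.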

Knapsack constraints
The traditional way of handling knapsack constraints is to adopt a cost-efficient variant of the greedy algorithm where in each iteration we select the item with the largest ratio between utility and cost. Furthermore, we compute a second solution by selecting the maximum-utility singleton item that is feasible. The idea is to use the second solution to rescue the situation in which the greedy algorithm starts with some cost-efficient small items and then is "starved" (i.e., the remaining budget is not enough to admit another valuable large item). This idea however falls short when it comes to the MSR problem. The reason is that there are multiple knapsacks and each one of them may be "starved" by different big items. A more sophisticated way is needed to compute an alternative second solution.
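The "starving" effect described above can be seen on a one-function toy instance. The sketch below is a minimal cost-efficient greedy, assuming that at each step only the functions that can still afford the candidate item contribute to its gain, and that the ranking stops once no item has positive density; the function name, the stopping rule, and the instance are assumptions made for illustration.

```python
def cost_efficient_greedy(items, functions, budgets, cost):
    """Repeatedly append the item with the largest (affordable gain) / cost ratio."""
    perm, chosen, spent = [], frozenset(), 0
    current = [f(chosen) for f in functions]
    remaining = set(items)

    def density(v):
        S = chosen | {v}
        gain = sum(f(S) - cur
                   for f, cur, b in zip(functions, current, budgets)
                   if spent + cost[v] <= b)  # f can still afford v after the prefix
        return gain / cost[v]

    while remaining:
        best = max(sorted(remaining), key=density)
        if density(best) <= 0:
            break  # nothing left that any function can use
        perm.append(best)
        remaining.remove(best)
        spent += cost[best]
        chosen = chosen | {best}
        current = [f(chosen) for f in functions]
    return perm

value = {"small": 1, "big": 9}
cost = {"small": 1, "big": 10}
f = lambda S: sum(value[v] for v in S)      # modular, hence submodular
greedy_perm = cost_efficient_greedy(["small", "big"], [f], [10], cost)
greedy_value = f(frozenset(greedy_perm))
singleton_value = f(frozenset(["big"]))     # the best feasible single item
```

Here the greedy takes the cheap item first, after which the valuable item no longer fits, while the best singleton recovers most of the value. With several budgets, a single rescue item is no longer enough, which motivates the construction that follows.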
We now discuss our proposed method in more detail. First, an item v ∈ V is called large with respect to a function f i ∈ F if its cost is more than half of the budget b i , that is, 2c(v) > b i . It is obvious that a function f i can afford at most one large item. The following variant of the MSR problem targets a similar objective to that of MSR, but exclusive to only large items.
Problem 2 (Max-submodular ranking of large items (MSRL)) Given a set of items V, a set of non-decreasing and submodular functions F = {f_1, ..., f_m}, a cost function c : V → R_+, and non-negative budgets b_i for each function f_i, the MSRL problem aims to find a permutation π ∈ σ(V) that maximizes
$$z(\pi) = \sum_{v_j \in \pi} \; \sum_{f_i \in F(v_j; \pi)} f_i(v_j) = \sum_{v_j \in \pi} z\bigl(v_j, c(\pi_{j-1})\bigr),$$
where F(v_j; π) is the set of functions that take the j-th item v_j ∈ π as a large item, i.e., F(v_j; π) = {f_i ∈ F : 2c(v_j) > b_i, c(π_j) ≤ b_i}, and z(v_j, c) is defined to be the contribution of item v_j by appending it to a prefix with cost c.
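As a small illustration of the large-item score, the sketch below assumes that z(π) adds, for every item v_j and every function in F(v_j; π), the singleton value f_i({v_j}). The function name `large_item_score` and the toy instance are hypothetical.

```python
def large_item_score(perm, functions, budgets, cost):
    """Score of a ranking when each function only credits its single large item."""
    total, prefix_cost = 0, 0
    for v in perm:
        prefix_cost += cost[v]                          # c(pi_j)
        for f, b in zip(functions, budgets):
            if 2 * cost[v] > b and prefix_cost <= b:    # f belongs to F(v_j; pi)
                total += f(frozenset([v]))
    return total

value = {"x": 2, "y": 5}
cost = {"x": 3, "y": 6}
f = lambda S: sum(value[v] for v in S)
functions, budgets = [f, f], [4, 10]
cheap_first = large_item_score(["x", "y"], functions, budgets, cost)
costly_first = large_item_score(["y", "x"], functions, budgets, cost)
```

Placing the cheaper item first lets the budget-4 function credit x and the budget-10 function credit y, whereas the reverse order loses the contribution of x.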
We start by proving that the cost-efficient greedy algorithm yields a 3-approximation when there is no large item in π * . Next, we devise a dynamic programming (DP) algorithm in Algorithm 2 to approximately solve MSRL. Finally, we prove that the best solution among the greedy solution and the DP solution can achieve an approximation guarantee that is arbitrarily close to 4.
Step 1: bounding small items in π * . We first discuss the case in the absence of large items in π * . Let us introduce some notation. We denote the j-th selected item by our algorithm by u j . We denote the k-th item of the optimal permutation π * by v * k . We denote the greedy solution of Algorithm 1 with coefficient α i = 1 by ALG 1 and the DP solution of Algorithm 2 by ALG 2 .
The next theorem shows that, if every function f i includes no such large item in π * , ALG 1 ensures a constant-factor guarantee. Otherwise, we have an additional term z(π * ), which we will bound later.
The proof relies on the next technical observation.

Proof of Theorem 3 Write
By greedy, we know that for arbitrary item v ∈ V in the j-th iteration it holds that To simplify the notation used in the above inequality, let us define X j = {i ∈ [m] | c(π j ) ≤ b i } to be the valid function indices for π j , and similarly We will start by lower bounding ALG 1 with Let us denote the right hand side with C. We will prove the theorem by showing that C ≥ (OPT − ALG 1 − z(π * ))/2.
Step 2: bounding large items in π * . When some functions do take large items in OPT, the quantity z(π * ) is positive, and we need to bound it. We will do this by solving approximately the MSRL problem.
Our first result allows us to order items by their cost when solving MSRL.
Theorem 4 Assume a permutation π with some item v i for which there is an index The proof relies on the following technical observation.

Observation 2 Given an item v and two sequences π, π with costs c(π
Consequently, we have proving the claim.

Proof of Theorem 4
Let v i be an item that is in π but not in π . Assume that 2c following the assumptions of the theorem. Consequently, F(v; π i ) = ∅ and z(v i , c(π i−1 )) = 0. Let u j be the j-th item in π . Observation 2 now implies that proving the claim.
The above theorem enables a way to limit ourselves to sequences of large items with non-decreasing costs when solving MSRL.
Let us assume for simplicity that z(·) takes integer values in [k]. We will discuss how to relax this assumption shortly. We can solve MSRL by constructing a table T with entry T(a, j) for each value a ∈ [k] and each item with index j ∈ [n]. We define the entry T(a, j) to be the lowest possible cost of a permutation using only the first j items with at least value a,
$$T(a, j) = \min\{\, c(\pi) : \pi \subseteq (v_1, \ldots, v_j),\ z(\pi) \ge a \,\},$$
with T(a, j) = ∞ if no such permutation exists. Note that it is also possible to solve MSRL by defining a different dual DP, where each entry T(b, j) contains the highest value realizable by a permutation using only the first j items with at most cost b. However, this dual DP is not amenable to the standard rounding trick we will introduce shortly.

Theorem 5
The table T satisfies the following relation:
$$T(a, j) = \min\Bigl\{\, T(a, j-1),\ \min_{a' \,:\, a' + z(v_j;\, T(a', j-1)) \ge a} \bigl( T(a', j-1) + c(v_j) \bigr) \Bigr\}. \qquad (9)$$
Proof We will prove by induction. The result holds trivially for T(a, 1). Next, we assume the theorem holds for all T(a′, j − 1). Now we examine T(a, j). Let π be a sequence responsible for T(a, j). Let X be the value of the right hand side of Equation 9. Clearly, we have X ≥ c(π), and we now prove the claim by showing that X ≤ c(π).
If v_j is not in π, then X ≤ T(a, j − 1) ≤ c(π), and we are done. If v_j is in π, then let π′ be the permutation without v_j. Let a′ = z(π′), and by the inductive hypothesis, we know that T(a′, j − 1) ≤ c(π′). Then a ≤ z(π) = a′ + z(v_j; c(π′)) ≤ a′ + z(v_j; T(a′, j − 1)), where the last inequality is by Observation 2. Therefore, according to the DP updating rule, we have
$$X \le T(a', j-1) + c(v_j) \le c(\pi') + c(v_j) = c(\pi),$$
completing the proof.
We can use Theorem 5 to construct T using a dynamic program, which is described in Algorithm 2. Next, we will show that the DP solves the MSRL problem.
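A dynamic program of this kind can be sketched as follows. This is an illustrative reading of the table definition, not a transcription of Algorithm 2: it assumes integer singleton values, processes items in non-decreasing cost, stores for each value a the cheapest sequence found that scores at least a, and takes z(v, c) to be the sum of f_i({v}) over functions for which v is large and still affordable after a prefix of cost c. The name `msrl_dp` and the upper bound `k` on the score are assumptions.

```python
import math

def msrl_dp(items, functions, budgets, cost, k):
    """Return (best score, sequence) for ranking large items; scores are integers <= k."""
    def z(v, c):
        # contribution of v when appended to a prefix of cost c
        return sum(f(frozenset([v])) for f, b in zip(functions, budgets)
                   if 2 * cost[v] > b and c + cost[v] <= b)

    order = sorted(items, key=lambda v: cost[v])
    # table[a] = (lowest cost, sequence) among sequences with score at least a
    table = [(0, ())] + [(math.inf, ())] * k
    for v in order:
        new = list(table)
        for a_prev, (c_prev, seq) in enumerate(table):
            if c_prev == math.inf:
                continue
            a_new = min(k, a_prev + z(v, c_prev))
            cand = (c_prev + cost[v], seq + (v,))
            if cand[0] < new[a_new][0]:
                new[a_new] = cand
        # a cheaper way to reach a higher score also reaches every lower score
        for a in range(k - 1, -1, -1):
            if new[a + 1][0] < new[a][0]:
                new[a] = new[a + 1]
        table = new
    best = max(a for a in range(k + 1) if table[a][0] < math.inf)
    return best, list(table[best][1])

value = {"x": 2, "y": 5}
cost = {"x": 3, "y": 6}
f = lambda S: sum(value[v] for v in S)
dp_result = msrl_dp(["x", "y"], [f, f], [4, 10], cost, k=10)
```

On this toy instance the table ends with the sequence (x, y) of score 7, which is the ordering by non-decreasing cost.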

Theorem 6 Assume that z(π ) is an integer in [k]
for every π . The permutation π responsible for T (a * , n), where a * = max{a | T (a, n) < ∞}, returned by Algorithm 2 has the largest z(·) value. Besides, Algorithm 2 runs in O(n(k + m) + m log m) time.
Proof The correctness of the algorithm follows directly from Theorem 5. There are in total k × n table entries. Note that we can avoid directly invoking z(v j ; ·), which alone needs time O(m), by sorting f i by their budget b i and gradually including more f i as c (T (a, j − 1)) and a decrease. This leads to an additional O(m) time per index j.
It is easy to see that both the cost-efficient greedy algorithm and the best singleton will pick item v 2 , which leads to a sub-optimal ranking, while the DP algorithm can help us find the optimal ranking.
The DP algorithm first initializes T (a, j) ← ∞ for all a and j. We then process items v 1 , v 2 , v 3 in non-decreasing order by their costs.
- Item v_1: we set T(a, 1) = c(v_1) for all 0 < a ≤ f_1(v_1) and T(0, 1) = 0.
- Item v_2: we set T(a, 2) = T(a, 1) for all a ≤ f_1(v_1), and T(a, 2) = c(v_2) for all
Finally, we return the permutation π = (v_1, v_3) responsible for T(a*, 3), where
So far we have assumed that z is an integer. Next, we show that with a standard rounding technique, the DP method in Algorithm 2 gives an FPTAS for MSRL. The idea is to apply the DP to a rounded instance, which is obtained by first scaling and rounding down every function f_i/K for a certain K.
where v is a large item for f_i. Let K = εP/m for any constant ε > 0. Define f′_i = ⌊f_i/K⌋ and let z′(π) be the score of a permutation using f′_i instead of f_i. Let π be the permutation with the largest z(π). Then K z′(π) ≥ (1 − ε) z(π).
Proof Due to scaling and rounding down we have f_i(v) − K f′_i(v) ≤ K. Since there can be at most one large item per function, and the score z contains at most m functions, we have z(π) − K z′(π) ≤ mK = εP ≤ ε z(π).
Proof Let π be the permutation with the largest z and let π′ be the permutation with the largest z′. Then z(π′) ≥ K z′(π′) ≥ K z′(π) ≥ (1 − ε) z(π), proving the approximation guarantee.
To prove the running time note that z(·) ≤ mP and z′(·) ≤ mP/K = m²/ε. Theorem 6 proves the claim.
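The scaling-and-rounding step can be sketched as follows. The sketch assumes that P is the largest singleton value f_i({v}) over pairs in which v is a large item for f_i, and that the scaling factor is K = εP/m; the data layout (one dictionary of singleton values per function) and the name `round_instance` are illustrative.

```python
import math

def round_instance(single_values, eps):
    """single_values[i][v] = f_i({v}) for the items v that are large for f_i."""
    m = len(single_values)
    P = max(val for per_f in single_values for val in per_f.values())
    K = eps * P / m
    rounded = [{v: math.floor(val / K) for v, val in per_f.items()}
               for per_f in single_values]
    return K, rounded

single_values = [{"x": 2.0}, {"y": 5.0}]
K, rounded = round_instance(single_values, eps=0.5)
true_score = 2.0 + 5.0                                   # z of the ranking (x, y)
rounded_score = rounded[0]["x"] + rounded[1]["y"]        # z' of the same ranking
```

The rounded values are integers bounded by m/ε per function, so the table of the dynamic program has polynomial size, and scaling the rounded score back by K loses at most an ε fraction of the true score.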
We are finally ready to state our main result for MSR with non-uniform cost.

Experimental evaluation
In this section, we evaluate the performance of the proposed algorithms on real-world datasets. We first discuss our experimental evaluation for a playlist-making use-case. We model this use-case using the max-activation ranking (MAR) problem, which is a special case of the MSR problem when the submodular functions f_i are 0-1 functions. We then conduct two experiments for the MSR problem: (i) multiple intents re-ranking and (ii) sequential active learning. Finally, we evaluate the running time of our methods. Statistics of the datasets used in the experiments are summarized in Table 1. Our implementation and pre-processing scripts can be found in a GitHub repository.

Proposed methods and baselines
The proposed greedy algorithms are denoted by Greedy-U and Greedy-W, as discussed in Sect. 4. The proposed dynamic program is denoted by DP. As baselines we use the following algorithms.
- The greedy algorithm for the SR problem (Azar and Gamzu 2011), which favors functions near completion. We refer to this baseline as AG.
- When only the minimum budget among all functions is considered, the objective is a submodular function as a whole. We then consider the well-known "best-of-two" algorithm that returns the best solution among the solutions found by a cost-efficient greedy method and by selecting the best singleton item. We refer to this baseline as Subm.
- A simple ranking method (Quality) that orders individual items in non-increasing quality.
- A random ranking algorithm (Random).
Note that in general, computing the optimal solution requires enumerating all sequences of length equal to the maximum budget, which is computationally intractable even for a modest scenario with universe set |V| = 100 and budget b = 10.

Experiments with the max-activation ranking (MAR) problem
We evaluate our methods on three datasets, the Million Song dataset (Bertin-Mahieux et al. 2011), the MovieLens dataset (Harper and Konstan 2015), and the Amazon Review dataset on books category (Ni et al. 2019). The three datasets have similar format, where each record can be seen as a triple of user, item and rating. We describe our experimental evaluation for the first dataset, and the other two datasets are processed in the same way and give very similar results, as can be verified in Fig. 1.
In the Million Song dataset, each record is a triple representing a user, song and play count. We assume that a user likes a song if they play the song more than once. We investigate an instance of the MAR problem for the application scenario of creating a playlist. In particular, we want to find a ranking of songs that maximizes the number of users who like at least one song among songs they listen to. In this case, each user is modeled as a 0-1 activation function. We generate a random budget for each user, i.e., the maximum number of songs a user will listen to, from 1 to a given maximum budget. We also generate a random cost from 1 to 10 for each song in order to experiment with an additional non-uniform cost scenario.
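The setup above can be sketched on a handful of made-up records. The helper names, the toy play counts, and the fixed budgets are assumptions for illustration; the paper's own pre-processing scripts may differ.

```python
import random
from collections import defaultdict

def liked_songs(records, min_plays=2):
    """Map each user to the songs played more than once."""
    liked = defaultdict(set)
    for user, song, plays in records:
        if plays >= min_plays:
            liked[user].add(song)
    return dict(liked)

def random_budgets(users, max_budget, seed=0):
    """Draw a budget uniformly from 1..max_budget for each user."""
    rng = random.Random(seed)
    return {u: rng.randint(1, max_budget) for u in users}

def activated_users(ranking, liked, budgets):
    """Number of users who like at least one of the first budgets[u] songs."""
    return sum(1 for u, songs in liked.items()
               if songs & set(ranking[:budgets[u]]))

records = [("u1", "s1", 3), ("u1", "s2", 1), ("u2", "s2", 5), ("u3", "s3", 2)]
liked = liked_songs(records)
budgets = {"u1": 1, "u2": 2, "u3": 1}   # fixed here; random_budgets draws them at random
score_a = activated_users(["s1", "s2", "s3"], liked, budgets)
score_b = activated_users(["s2", "s1", "s3"], liked, budgets)
drawn = random_budgets(liked, max_budget=5)
```

Each user acts as a 0-1 activation function, so the objective simply counts activated users; the first ranking activates two users and the second only one.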
The results of our evaluation are shown in Fig. 1. The error bars are over random user budgets and item costs. In the unit-cost scenario, the proposed Greedy-W algorithm is the best performing, closely followed by the proposed Greedy-U algorithm. The performance of the baselines is inferior, and one reason is that they fail to take into account the user budget. In the non-uniform cost scenario, the proposed Greedy-U algorithm obtains the best performance. Note that it is expected that DP has poor performance, as it is meant to help in extreme cases. Also note that DP does not scale for the book-list dataset; more details on scalability are discussed in Sect. 6.4. Interestingly, Greedy-W performs worse than AG, which indicates that a more sophisticated weighting scheme is needed to combine non-uniform budget and cost.

Fig. 1 Results of using the MAR problem formulation for making a playlist of items. The goal is to maximize the number of activated users. The universe V includes songs, movies or books. A user (a 0-1 activation function f_i) is activated if they like at least one item among all items they consume within their budget. Markers are jittered horizontally to avoid overlap.

Experiments with the max-submodular ranking (MSR) problem

Multiple intents re-ranking
We simulate a web-page ranking application for documents in the 20 Newsgroups dataset (Dua and Graff 2017). For each newsgroup, we treat its title as a query, and collect the documents that contain the query. We extract 5 topics from the collected documents by means of an LDA model (Blei et al. 2003). Subsequently, each topic (i.e., its top 20 keywords) is considered as a potential user intent, and the submodular utility of a topic for a given set of documents is the coverage rate of its top keywords. We aim to find a ranking of documents that maximizes the total utility of all user intents. As in the previous experiment, we generate a random budget for each user intent, i.e., the maximum number of documents the potential user will read, from 1 to a given maximum budget. For an additional non-uniform cost scenario, we use the document length as the cost of reading a document, and accordingly multiply the budget by the average document length.
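The objective of this experiment can be sketched as follows for the non-uniform cost scenario. The sketch assumes that a user intent reads the longest prefix of the ranking whose total cost fits within its budget, and that the utility of an intent is the fraction of its keywords covered by that prefix. All function names, documents, keywords, costs, and budgets are hypothetical toy values.

```python
from typing import Dict, List, Set, Tuple


def budgeted_prefix(ranking: List[str], cost: Dict[str, float],
                    budget: float) -> List[str]:
    """Longest prefix of the ranking whose total cost fits the budget."""
    prefix, spent = [], 0.0
    for doc in ranking:
        if spent + cost[doc] > budget:
            break  # the reader stops at the first document that does not fit
        spent += cost[doc]
        prefix.append(doc)
    return prefix


def coverage_rate(docs: List[str], doc_words: Dict[str, Set[str]],
                  keywords: Set[str]) -> float:
    """Fraction of the intent's keywords that appear in the given documents."""
    covered: Set[str] = set()
    for d in docs:
        covered |= doc_words[d] & keywords
    return len(covered) / len(keywords)


def msr_objective(ranking: List[str], doc_words: Dict[str, Set[str]],
                  cost: Dict[str, float],
                  intents: List[Tuple[Set[str], float]]) -> float:
    """Sum of budgeted coverage utilities over all user intents."""
    return sum(
        coverage_rate(budgeted_prefix(ranking, cost, b), doc_words, kw)
        for kw, b in intents
    )


# Hypothetical toy instance: cost equals document length in words.
doc_words = {"d1": {"gpu", "driver", "linux"}, "d2": {"gpu"},
             "d3": {"driver", "kernel"}}
cost = {"d1": 3, "d2": 1, "d3": 2}
intents = [({"gpu", "driver"}, 2), ({"kernel", "driver"}, 3)]

print(msr_objective(["d1", "d2", "d3"], doc_words, cost, intents))
print(msr_objective(["d2", "d3", "d1"], doc_words, cost, intents))
```

In this toy instance, ranking the longest document first yields a lower total utility, because that document alone exceeds the budget of the first intent.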
The results of our experiment are shown in Fig. 2, where we report the average performance across all newsgroups. In the unit-cost scenario, the top-contender algorithms have close performance. This is due to the overwhelming advantage of lengthy documents that contain more words and produce higher utility. In the more realistic non-uniform cost scenario, our algorithms, Greedy-U and Greedy-W, achieve the best performance. The Quality algorithm performs worst, as it fails to consider the cost of items, and its first-ranked lengthy document exceeds the user budget most of the time.

Fig. 2 MSR for multiple intents re-ranking in web page ranking. The goal is to maximize the total utility of all user intents within their individual reading budget. The universe V includes documents. The utility of a user intent (a coverage function f_i) is represented by the coverage rate of its top keywords. Markers are jittered horizontally to avoid overlap

Sequential active learning
Active learning seeks to make label queries on only a small number of informative data points in order to maximize model performance. In particular, for the k-nearest neighbors (kNN) model, an intuitive measure for informativeness of a set of labeled data points is the average distance from an unlabeled data point to its closest labeled point, i.e., the facility-location function (Wei et al. 2015). We refer to this average distance as the radius. Thus, the active-learning task can be naturally formulated as labeling a small subset of data to maximize the radius reduction. Note that the reduction of the radius by labeling a subset of data points is clearly non-decreasing and submodular. In our setting, we assume that we have access to multiple models that are trained on the same labeled data, and we aim to label data sequentially to maximize the total reduction in the radii among all models. This happens, for example, when each model runs on a different subset of features. Interestingly, in this case each model can be seen as a student with different learning capacity, and a teacher tries to optimize the classroom teaching by feeding them labeled data (Zhu et al. 2017). We evaluate the performance of active-learning kNNs (k = 1) with Euclidean distance in the Handwritten Digits dataset (Dua and Graff 2017). Each kNN model adopts a different strategy in unsupervised feature selection, such as variance thresholding, PCA, and feature agglomeration. Again, we generate a random query budget for each model and a random cost (from 1 to 10) for labeling each data point.
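The radius and its reduction can be illustrated with a small sketch. It makes two assumptions that are not stated in the text: the radius is averaged over all data points (labeled points contribute a distance of zero), and a non-empty seed set of labeled points exists so that the initial radius is finite. The function names and the toy points are hypothetical.

```python
import math
from typing import List, Sequence, Set

Point = Sequence[float]


def radius(points: List[Point], labeled: Set[int]) -> float:
    """Average Euclidean distance from each point to its closest labeled point."""
    total = 0.0
    for p in points:
        total += min(math.dist(p, points[j]) for j in labeled)
    return total / len(points)


def radius_reduction(points: List[Point], seed: Set[int],
                     new_labels: Set[int]) -> float:
    """Reduction of the radius obtained by labeling new_labels on top of seed."""
    return radius(points, seed) - radius(points, seed | new_labels)


# Hypothetical toy data: four points on a line, point 0 is already labeled.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
seed = {0}

print(radius(points, seed))
print(radius_reduction(points, seed, {3}))
print(radius_reduction(points, seed, {1}))
# Marginal gain of labeling point 1 after point 3 is already labeled.
print(radius_reduction(points, seed, {1, 3}) - radius_reduction(points, seed, {3}))
```

In this toy instance, labeling point 1 reduces the radius by less once point 3 is already labeled than it does on its own, which is the diminishing-returns behavior expected of a submodular function.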
As we can see in Fig. 3, all greedy algorithms are very effective in reducing the radii. The radius reduction correlates clearly with model accuracy on the test data. Note that the Random algorithm is a standard strong baseline in data subset selection, yet it is outperformed by the greedy algorithms by a large margin. The gap becomes more evident in the non-uniform cost scenario, as the Random algorithm fails to take into account the item costs.

Running time
We examine the scalability of all methods by fixing either the number of users (i.e., functions) or the maximum budget (equal to the number of items), while varying the other. In Fig. 4 we show the running time of all algorithms for the task of making a synthetic playlist. In this case, we generate a dataset by assuming that each user likes a small random subset of items. We generate a random budget for each user, from 1 to the given maximum budget, and a random cost from 1 to 10 for each item. When comparing the running time, the Quality algorithm is a meaningful baseline, as it produces a ranking after a single evaluation of each item over all functions, i.e., in time O(max{n log(n), mn}). Its running time varies almost linearly as a function of the budget, in contrast to the behavior of the naïve greedy algorithms. Thanks to the lazy evaluation technique (Leskovec et al. 2007), however, the running time of all greedy algorithms in practice also grows nearly linearly in the budget. The AG algorithm is slower, as it requires frequent function evaluations because its greedy criterion depends on the current function values. The running time of the DP algorithm grows quadratically in the number of functions, so it has difficulty scaling to a very large number of functions. On the other hand, it scales well in the number of items, and in particular, when the budget is big, it finishes quickly as there is no large item. The running time of all algorithms except Random grows linearly in the number of functions, which is inevitable if the utility of items is considered.
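The idea behind lazy evaluation can be sketched for the simplest setting: a single non-decreasing submodular function with a cardinality constraint. This is a generic illustration and not the implementation of the ranking algorithms evaluated here. By submodularity, a marginal gain computed in an earlier round is an upper bound on the current gain, so an item needs to be re-evaluated only when it reaches the top of a priority queue. The names `lazy_greedy`, `coverage_gain`, and `sets` are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Set


def lazy_greedy(items: List[str],
                gain: Callable[[str, List[str]], float],
                k: int) -> List[str]:
    """Select up to k items greedily with lazy re-evaluation of marginal gains."""
    selected: List[str] = []
    # Heap entries: (-gain, item, number of selected items when gain was computed).
    heap = [(-gain(x, selected), x, 0) for x in items]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        neg_gain, x, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The gain is up to date and tops all stale upper bounds: take x.
            selected.append(x)
        else:
            # The gain is stale: recompute it and push the item back.
            heapq.heappush(heap, (-gain(x, selected), x, len(selected)))
    return selected


# Hypothetical toy coverage function: each item covers a set of elements.
sets: Dict[str, Set[int]] = {
    "a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1},
}


def coverage_gain(x: str, selected: List[str]) -> float:
    """Number of new elements that x covers beyond the selected items."""
    covered = set().union(*(sets[s] for s in selected))
    return len(sets[x] - covered)


print(lazy_greedy(list(sets), coverage_gain, 2))
```

In the toy run, after "c" is selected only item "a" is re-evaluated before being chosen; items "b" and "d" keep their stale gains and are never recomputed.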

Conclusions
In this paper, we introduce a novel problem in the active area of submodular optimization. Our problem, max-submodular ranking (MSR), asks to find a ranking of items such that the sum of multiple budgeted submodular utilities is maximized. The MSR problem has wide application in the ranking of web pages, ads, and other types of items. We propose several practical algorithms with approximation guarantees for the MSR problem, with either cardinality or knapsack budget constraints. We empirically demonstrate the superior performance of the proposed algorithms on real-life datasets, compared with a state-of-the-art baseline and other meaningful heuristics. One direction for future work is to narrow the gap between the approximation ratio and the lower bound. Another direction is to study the online version of the MSR problem, to allow for the arrival of new submodular functions. Other potential directions include imposing a more general constraint for each submodular function and experimenting with new applications.